□ 1. Document ID: US 20020106742 A1

L3: Entry 1 of 9 File: PGPB Aug 8, 2002

PGPUB-DOCUMENT-NUMBER: 20020106742
PGPUB-FILING-TYPE: new

DOCUMENT-IDENTIFIER: US 20020106742 A1

TITLE: Nucleic acids encoding active and inactive CCR5 chemokine receptors
PUBLICATION-DATE: August 8, 2002
INVENTOR-INFORMATION:

NAME                 CITY               STATE   COUNTRY   RULE-47

Samson, Michel       Gentilly                   FR
Parmentier, Marc     Linkebeek                  BE
Vassart, Gilbert     Brussels                   BE
Libert, Frederick    Braine-L'Alleud            BE

US-CL-CURRENT: 435/69.51; 435/320.1, 435/325, 435/5, 514/44, 530/350, 536/23.5



ABSTRACT:

A peptide has an amino acid sequence having more than 80% homology with the amino
acid sequence listed as SEQ ID NO:4. A nucleic acid molecule has more than 80%
homology with one of the nucleic acid sequences listed as SEQ ID NO:1, SEQ ID NO:2
and SEQ ID NO:3. Ligands, anti-ligands, cells and vectors relating to the peptide and/or
nucleic acid molecule are also used.

□ 2. Document ID: US 20020094536 A1

L3: Entry 2 of 9 File: PGPB Jul 18, 2002

PGPUB-DOCUMENT-NUMBER: 20020094536
PGPUB-FILING-TYPE: new

DOCUMENT-IDENTIFIER: US 20020094536 A1

TITLE: Methods for making polynucleotide libraries, polynucleotide arrays, and cell
libraries for high-throughput genomics analysis

PUBLICATION-DATE: July 18, 2002

INVENTOR-INFORMATION:

NAME                 CITY      STATE   COUNTRY   RULE-47

Lofquist, Alan       Seattle   WA      US
Finney, Robert E.    Seattle   WA      US
Leung, David         Seattle   WA      US

US-CL-CURRENT: 435/6; 435/287.2, 435/320.1



ABSTRACT:

A method for high-throughput genomics analysis, to identify the therapeutic or
diagnostic utility of genes, entails the use of a construct to disrupt a gene or
alleles of a gene in cells of interest. Arrays of such cells can be used to monitor
such disrupted cells phenotypically in the context, for example, of testing drug
candidates. Polynucleotides that comprise part of the disrupted genes can be
recovered from such "knockout" cells, by virtue of an origin of replication or a
host cell selection marker sequence that is part of the construct. The recovered
polynucleotides can be used to identify the disrupted genes or to make homologous
recombination vectors, which in turn can be employed to make multi-allele knockout
cells.



□ 3. Document ID: US 6331388 B1

L3: Entry 3 of 9 File: USPT

US-PAT-NO: 6331388

DOCUMENT-IDENTIFIER: US 6331388 B1
TITLE: Immune response enhancer
DATE-ISSUED: December 18, 2001

INVENTOR-INFORMATION:

NAME                   CITY         STATE   ZIP CODE   COUNTRY

Malkovsky; Miroslav    Madison      WI
Wells; Andrew D.       Mt. Laurel   NJ

US-CL-CURRENT: 435/5; 424/278.1, 435/375, 435/69.1, 435/7.21, 435/7.22, 435/7.23,
435/7.24, 435/7.31, 435/7.32, 514/44

ABSTRACT:

The present invention provides methods for specifically increasing expression of MHC
class I molecules in cells, and in particular, in poorly immunogenic tumor cells as
well as in pathogen-infected cells. Also provided by the present invention are
methods for increasing presentation of endogenous antigens onto the cell surface by
MHC class I molecules, as well as methods of increasing the immunity of an animal
against an antigen. The methods presented herein are useful in enhancing immune
recognition of any cell infected with any pathogen, for in vitro and in vivo
screening of candidate immunogene therapeutic approaches, and for enhancing the
generation of antibodies to an otherwise poorly immunogenic antigen or cell. The
present invention further provides methods for reducing or increasing the radiation
sensitivity of a cell.

14 Claims, 92 Drawing figures
Exemplary Claim Number: 1
Number of Drawing Sheets: 61

□ 4. Document ID: US 6191268 B1

L3: Entry 4 of 9 File: USPT

US-PAT-NO: 6191268

DOCUMENT-IDENTIFIER: US 6191268 B1

TITLE: Compositions and methods relating to DNA mismatch repair genes
DATE-ISSUED: February 20, 2001

INVENTOR-INFORMATION:

NAME                    CITY            STATE   ZIP CODE   COUNTRY

Liskay; Robert M.       Lake Oswego     OR
Bronner; C. Eric        Portland        OR
Baker; Sean M.          Portland        OR
Bollag; Roni J.         Martinez        GA
Kolodner; Richard D.    Jamaica Plain   MA

US-CL-CURRENT: 536/23.5; 536/24.3, 536/24.31, 536/24.33

ABSTRACT:

Genomic sequences of human mismatch repair genes are described, as are methods of
detecting mutations and/or polymorphisms in those genes. Also described are methods
of diagnosing cancer susceptibility in a subject, and methods of identifying and
classifying mismatch-repair-defective tumors. In particular, sequences and methods
relating to human mutL homologs, hMLH1 and hPMS1 genes are provided.

80 Claims, 16 Drawing figures
Exemplary Claim Number: 4
Number of Drawing Sheets: 25

□ 5. Document ID: US 6165713 A

L3: Entry 5 of 9 File: USPT

US-PAT-NO: 6165713

DOCUMENT-IDENTIFIER: US 6165713 A

TITLE: Composition and methods relating to DNA mismatch repair genes
DATE-ISSUED: December 26, 2000

INVENTOR-INFORMATION:

NAME                    CITY            STATE   ZIP CODE   COUNTRY

Liskay; Robert M.       Lake Oswego     OR
Bronner; C. Eric        Portland        OR
Baker; Sean M.          Portland        OR
Bollag; Roni J.         Martinez        GA
Kolodner; Richard D.    Jamaica Plain   MA

US-CL-CURRENT: 435/6; 435/7.1, 435/91.1, 435/91.2, 536/24.33

ABSTRACT:

Genomic sequences of human mismatch repair genes are described, as are methods of
detecting mutations and/or polymorphisms in those genes. Also described are methods
of diagnosing cancer susceptibility in a subject, and methods of identifying and
classifying mismatch-repair-defective tumors. In particular, sequences and methods
relating to human mutL homologs, hMLH1 and hPMS1 genes are provided.

55 Claims, 12 Drawing figures
Exemplary Claim Number: 1
Number of Drawing Sheets: 22



□ 6. Document ID: US 5922855 A

L3: Entry 6 of 9 File: USPT

US-PAT-NO: 5922855

DOCUMENT-IDENTIFIER: US 5922855 A

TITLE: Mammalian DNA mismatch repair genes MLH1 and PMS1
DATE-ISSUED: July 13, 1999

INVENTOR-INFORMATION:

NAME                    CITY            STATE   ZIP CODE   COUNTRY

Liskay; Robert M.       Lake Oswego     OR
Bronner; C. Eric        Portland        OR
Baker; Sean M.          Portland        OR
Bollag; Roni J.         Martinez        GA
Kolodner; Richard D.    Jamaica Plain   MA

US-CL-CURRENT: 536/23.5; 536/24.3, 536/24.31, 536/24.33



ABSTRACT:

We have discovered two human genes, hMLH1 and hPMS1, each of which apparently
encodes for a protein involved in DNA mismatch repair. The hMLH1 gene encodes for a
protein which is homologous to the bacterial DNA mismatch repair protein MutL, and
is located on human chromosome 3p21.3-23. We believe that mutations in the hMLH1
gene cause hereditary non-polyposis colon cancer (HNPCC) in some individuals based
upon the similarity of the hMLH1 gene product to the yeast DNA mismatch repair
protein MLH1, the coincident location of the hMLH1 gene and the HNPCC locus on
chromosome 3, and hMLH1 missense mutations in affected individuals from a chromosome
3-linked HNPCC family. The human hPMS1 gene is homologous to the yeast DNA mismatch
repair gene PMS1, and is located on human chromosome 7q. We believe that the hPMS1
gene is a strong candidate for HNPCC testing because the yeast proteins MLH1 and
PMS1 have been shown to be involved in the same DNA repair pathway and because hMLH1
and hMSH2 have both been implicated in HNPCC families. The most immediate use for
hMLH1 and hPMS1 will be in screening tests on individuals who are members of
families which exhibit high frequencies of early onset cancer. We have also isolated
and sequenced mouse MLH1 and PMS1 genes. We have produced chimeric mice with a
mutant form of the PMS1 gene that will enable us to derive mice that are
heterozygous or homozygous for mutation in mPMS1. These mice will be useful for
cancer research. We have also produced and isolated antibodies directed to hPMS1
which are useful in assays to detect the presence of protein in tumor samples.

3 Claims, 16 Drawing figures
Exemplary Claim Number: 1
Number of Drawing Sheets: 17



□ 7. Document ID: US 5807732 A

L3: Entry 7 of 9 File: USPT

US-PAT-NO: 5807732

DOCUMENT-IDENTIFIER: US 5807732 A

TITLE: GDP-L-fucose:.beta.-D-galactoside 2-.alpha.-L-fucosyltransferases, DNA
sequences encoding the same, method for producing the same and a method of
genotyping a person

DATE-ISSUED: September 15, 1998
INVENTOR-INFORMATION:

NAME                 CITY                STATE   ZIP CODE   COUNTRY

Lowe; John B.        Ann Arbor           MI      48105
Lennon; Gregory      Castro Valley       CA      94552
Rouquier; Sylvie     34000 Montpellier                      FR
Giorgi; Dominique    34000 Montpellier                      FR
Kelly; Robert J.     Trenton             MI      48183

US-CL-CURRENT: 435/358; 435/193, 435/252.2, 435/252.3, 435/320.1, 435/325, 435/365,
435/69.1, 536/23.2

ABSTRACT:

The gene encoding GDP-L-fucose:.beta.-D-Galactoside 2-.alpha.-L-fucosyltransferase
has been cloned, and a mutation in this gene has been found to be responsible for an
individual being a non-secretor.

12 Claims, 30 Drawing figures
Exemplary Claim Number: 9
Number of Drawing Sheets: 23

□ 8. Document ID: US 5703048 A

L3: Entry 8 of 9 File: USPT

US-PAT-NO: 5703048

DOCUMENT-IDENTIFIER: US 5703048 A

TITLE: Protection against liver damage by HGF

DATE-ISSUED: December 30, 1997

INVENTOR-INFORMATION:

NAME              CITY       STATE   ZIP CODE   COUNTRY

Roos; Filip       Brisbane   CA
Schwall; Ralph    Pacifica   CA

US-CL-CURRENT: 514/12; 435/360, 514/2, 514/838, 514/893, 514/894, 530/350, 530/399

ABSTRACT:

The present invention provides methods for preventing occurrence or progression of
liver damage using hepatocyte growth factor. In the methods, a preventatively
effective amount of the hepatocyte growth factor is administered to the patient. The
hepatocyte growth factor can be administered, for instance, prior to administering a
hepatotoxic therapy to the patient. The hepatocyte growth factor can further be
administered with activin or transforming growth factor-beta to prevent liver
damage. Compositions comprising hepatocyte growth factor and activin antagonist or
transforming growth factor-beta antagonist are also provided by the invention.

15 Claims, 9 Drawing figures
Exemplary Claim Number: 1
Number of Drawing Sheets: 5



□ 9. Document ID: US 5654404 A

L3: Entry 9 of 9 File: USPT

US-PAT-NO: 5654404

DOCUMENT-IDENTIFIER: US 5654404 A

TITLE: Protection against liver damage by HGF

DATE-ISSUED: August 5, 1997

INVENTOR-INFORMATION:

NAME              CITY       STATE   ZIP CODE   COUNTRY

Roos; Filip       Brisbane   CA
Schwall; Ralph    Pacifica   CA

US-CL-CURRENT: 530/387.3; 424/134.1, 424/136.1, 424/178.1, 530/350

ABSTRACT:

The present invention provides methods for preventing occurrence or progression of
liver damage using hepatocyte growth factor. In the methods, a preventatively
effective amount of the hepatocyte growth factor is administered to the patient. The
hepatocyte growth factor can be administered, for instance, prior to administering a
hepatotoxic therapy to the patient. The hepatocyte growth factor can further be
administered with activin or transforming growth factor-beta to prevent liver
damage. Compositions comprising hepatocyte growth factor and activin antagonist or
transforming growth factor-beta antagonist are also provided by the invention.

18 Claims, 9 Drawing figures
Exemplary Claim Number: 1
Number of Drawing Sheets: 5

COPYRIGHT (C) 2002 BIOLOGICAL ABSTRACTS INC. (R)

=> s (predict? (3a) structure) and (protein or polypeptide)

L1 8344 (PREDICT? (3A) STRUCTURE) AND (PROTEIN OR POLYPEPTIDE)

=> s l1 and review/dt

L4 ANSWER 1 OF 96 MEDLINE

AN 2002087866 MEDLINE
DN 21674677 PubMed ID: 11814876
TI Molecular modelling in structural biology.
AU Forster Mark J
CS Informatics Laboratory, National Institute for Biological Standards and
Control, Blanche Lane, South Mimms, Hertfordshire, UK.. mfoster@nibsc.ac.uk
SO MICRON, (2002) 33 (4) 365-84. Ref: 154
Journal code: 9312850. ISSN: 0968-4328.
CY England: United Kingdom
DT Journal; Article; (JOURNAL ARTICLE)
General Review; (REVIEW)
(REVIEW, ACADEMIC)
LA English
FS Priority Journals
EM 200205
ED Entered STN: 20020130
Last Updated on STN: 20020509
Entered Medline: 20020508
AB Molecular modelling is a powerful methodology for analysing the three
dimensional structure of biological macromolecules. There are many ways in
which molecular modelling methods have been used to address problems in
structural biology. It is not widely appreciated that modelling methods
are often an integral component of structure determination by NMR
spectroscopy and X-ray crystallography. In this review we consider some of
the numerous ways in which modelling can be used to interpret and
rationalise experimental data and in constructing hypotheses that can be
tested by experiment. Genome sequencing projects are producing a vast
wealth of data describing the protein coding regions of the
genome under study. However, only a minority of the protein
sequences thus identified will have a clear sequence homology to a known
protein. In such cases valuable three-dimensional models of the
protein coding sequence can be constructed by homology modelling
methods. Threading methods, which used specialised schemes to relate
protein sequences to a library of known structures, have been
shown to be able to identify the likely protein fold
even in cases where there is no clear sequence homology. The number of
protein sequences that cannot be assigned to a structural class by
homology or threading methods, simply because they belong to a previously
unidentified protein folding class, will decrease in
the future as collaborative efforts in systematic structure determination
begin to develop. For this reason, modelling methods are likely to become
increasingly useful in the near future. The role of the blind prediction
contests, such as the Critical Assessment of techniques for
protein Structure Prediction (CASP), will be
briefly discussed. Methods for modelling protein-ligand and
protein-protein complexes are also described and
examples of their applications given.

L4 ANSWER 2 OF 96 MEDLINE 

AN 2002178153 MEDLINE
DN 21909381 PubMed ID: 11911887
TI Functional plasticity of CH domains.
AU Gimona Mario; Djinovic-Carugo Kristina; Kranewitter Wolfgang J; Winder
Steven J
CS Department of Cell Biology, Institute of Molecular Biology, Austrian
Academy of Sciences, Salzburg, Austria.. mgimona@server1.imolbio.oeaw.ac.at
SO FEBS LETTERS, (2002 Feb 20) 513 (1) 98-106. Ref: 55
Journal code: 0155157. ISSN: 0014-5793.
CY Netherlands
DT Journal; Article; (JOURNAL ARTICLE)
General Review; (REVIEW)
(REVIEW, TUTORIAL)
LA English
FS Priority Journals
EM 200205
ED Entered STN: 20020326
Last Updated on STN: 20020508
Entered Medline: 20020507
AB With the refinement of algorithms for the identification of distinct
motifs from sequence databases, especially those using secondary
structure predictions, new protein modules
have been determined in recent years. Calponin homology (CH) domains were
identified in a variety of proteins ranging from actin
cross-linking to signaling and have been proposed to function either as
autonomous actin binding motifs or serve a regulatory function. Despite
the overall structural conservation of the unique CH domain fold
the individual modules display a quite striking functional variability.
Analysis of the actopaxin/parvin protein family suggests the
existence of novel (type 4 and type 5) CH domain families which require
special attention, as they appear to be a good example for how CH domains
may function as scaffolds for other functional motifs of different
properties.



L4 ANSWER 3 OF 96 MEDLINE 

AN 2002101763 MEDLINE
DN 21674960 PubMed ID: 11814598
TI Structural proteomics: developments in structure-to-function
predictions.
AU Norin Martin; Sundstrom Michael
CS Biovitrum, Department of Structural Chemistry, Stockholm, Sweden.
SO TRENDS IN BIOTECHNOLOGY, (2002 Feb) 20 (2) 79-84. Ref: 50
Journal code: 8310903. ISSN: 0167-7799.
CY England: United Kingdom
DT Journal; Article; (JOURNAL ARTICLE)
General Review; (REVIEW)
(REVIEW, TUTORIAL)
LA English
FS Priority Journals
EM 200204
ED Entered STN: 20020209
Last Updated on STN: 20020412
Entered Medline: 20020410
AB The major challenge for post-genomic research is to functionally assign
and validate a large number of novel target genes and their corresponding
proteins. Functional genomics approaches have, therefore, gained
considerable attention in the quest to convert this massive data set into
useful information. One of the crucial components for the functional
understanding of unassigned proteins is the analysis of their
experimental or modeled 3D structures. Structural proteomics initiatives
are generating protein structures at an unprecedented rate but
our current knowledge of 3D-structural space is still limited. Estimates
on the completeness of the 3D-structural coverage of proteins
vary but it is generally accepted that only a minority of the structural
proteome has a template structure from which reliable conclusions can be
drawn. Thus, structural proteomics has set out to build a map of
protein structures that will represent all protein
folds included in the 'global proteome'.



L4 ANSWER 4 OF 96 MEDLINE 
AN 2002074954 MEDLINE
DN 21661098 PubMed ID: 11802435
TI GTOP: database for protein 3D structure
prediction.
AU Kawabata T; Nishikawa K
CS takawaba@lab.nig.ac.jp
SO TANPAKUSHITSU KAKUSAN KOSO. PROTEIN, NUCLEIC ACID, ENZYME, (2001 Dec) 46
(16 Suppl) 2592-7. Ref: 12
Journal code: 0413762. ISSN: 0039-9450.
CY Japan
DT Journal; Article; (JOURNAL ARTICLE)
General Review; (REVIEW)
(REVIEW, TUTORIAL)
LA Japanese
FS Priority Journals
EM 200202
ED Entered STN: 20020125
Last Updated on STN: 20020227
Entered Medline: 20020226

L4 ANSWER 5 OF 96 MEDLINE 

AN 2001374922 MEDLINE
DN 21324818 PubMed ID: 11430986
TI Structure--function characterization of cellulose synthase: relationship
to other glycosyltransferases.
AU Saxena I M; Brown R M Jr; Dandekar T
CS Section of Molecular Genetics and Microbiology, School of Biological
Sciences, University of Texas at Austin, Austin, TX 78712, USA.
SO PHYTOCHEMISTRY, (2001 Aug) 57 (7) 1135-48. Ref: 48
Journal code: 0151434. ISSN: 0031-9422.
CY United States
DT Journal; Article; (JOURNAL ARTICLE)
General Review; (REVIEW)
(REVIEW, TUTORIAL)
LA English
FS Priority Journals
EM 200109
ED Entered STN: 20010924
Last Updated on STN: 20010924
Entered Medline: 20010920
AB A combined structural and functional model of the catalytic region of
cellulose synthase is presented as a prototype for the action of
processive beta-glycosyltransferases and other glycosyltransferases. A 285
amino acid segment of the Acetobacter xylinum cellulose synthase
containing all the conserved residues in the globular region was subjected
to protein modeling using the genetic algorithm. This region
folds into a single large domain with a topology exhibiting a
mixed alpha/beta structure. The predicted
structure serves as a topological outline for the structure of
this processive beta-glycosyltransferase. By incorporating new
site-directed mutagenesis data and comparative analysis of the conserved
aspartic acid residues and the QXXRW motif we deduce a number of
functional implications based on the structure. This includes location of
the UDP--glucose substrate-binding cavity, suggestions for the catalytic
processing including positions of conserved and catalytic residues,
secondary structure arrangement and domain organization. Comparisons to
cellulose synthases from higher plants (genetic algorithm based model for
cotton CelA1), data from neural network predictions (PHD), and to the
recently experimentally determined structures of the non-processive SpsA
and beta 4-galactosyltransferase retest and further validate our
structure-function description of this glycosyltransferase.

L4 ANSWER 6 OF 96 MEDLINE 

AN 2001640253 MEDLINE
DN 21548623 PubMed ID: 11689334
TI Taking a functional genomics approach in molecular medicine.
AU Yaspo M L
CS Max Planck Institute for Molecular Genetics, Ihnestrasse 73, D-14195,
Berlin, Germany.. yaspo@molgen.mpg.de
SO Trends Mol Med, (2001 Nov) 7 (11) 494-501. Ref: 70
Journal code: 100966035. ISSN: 1471-4914.
CY England: United Kingdom
DT Journal; Article; (JOURNAL ARTICLE)
General Review; (REVIEW)
(REVIEW, TUTORIAL)
LA English
FS Priority Journals
EM 200201
ED Entered STN: 20011107
Last Updated on STN: 20020129
Entered Medline: 20020128
AB The elucidation of genetic components of human diseases at the molecular
level provides crucial information for developing future causal
therapeutic intervention. High-throughput genome sequencing and systematic
experimental approaches are fuelling strategic programs designed to
investigate gene function at the biochemical, cellular and organism
levels. Bioinformatics is one important tool in functional genomics,
although showing clear limitations in predicting ab initio gene
structures, gene function and protein folds
from raw sequence data. Systematic large-scale data-set generation, using
the same type of experiments that are used to decipher the function of
single genes, are being applied on entire genomes. Comparative genomics,
establishment of gene catalogues, and investigation of cellular and tissue
molecular profiles are providing essential tools for understanding gene
function in complex biological networks.

L4 ANSWER 7 OF 96 MEDLINE 

AN 2001420985 MEDLINE
DN 21363291 PubMed ID: 11470603
TI Structural genomics: opportunities and challenges.
AB Following the complete genome sequencing of an increasing number of
organisms, structural biology is engaging in a systematic approach of
high-throughput structure determination called structural genomics to
create a complete inventory of protein folds/structures that will help
predict functions for all proteins. First results show that structural
genomics will be highly effective in finding functional annotations for
proteins of unknown function.

AB The three-dimensional structures of LG/LNS domains from neurexin, the
laminin alpha 2 chain and sex hormone-binding globulin reveal a close
structural relationship to the carbohydrate-binding pentraxins and other
lectins. However, these LG/LNS domains appear to have a preferential
ligand-interaction site distinct from the carbohydrate-binding sites found
in lectins, and this interaction site accommodates not only sugars but
also steroids and proteins. In fact, the LG/LNS domain
interaction site has features reminiscent of the antigen-combining sites
in immunoglobulins. The LG/LNS domain presents an interesting case in
which the fold has remained conserved but the functional sites
have evolved; consequently, making predictions of
structure-function relationships on the basis of the lectin
fold alone is difficult.

AB Fold assignments for newly sequenced genomes belong to the most
important and interesting applications of the booming field of
protein structure prediction. We present a
brief survey and a discussion of such assignments completed to date, using
as an example several fold assignment projects for
proteins from the Escherichia coli genome. This review focuses on
steps that are necessary to go beyond the simple assignment projects and
into the development of tools extending our understanding of functions of
proteins in newly sequenced genomes. This paper also discusses
several problems seldom addressed in the literature, such as the problem
of domain prediction and complementary predictions (e.g., transmembrane
regions and flexible regions) and cross-correlation of predictions from
different servers. The influence of sequence and structure
database growth on prediction success is also addressed.
Finally, we discuss the perspectives of the field in the context of
massive sequence and structure determination projects, as well as the
development of novel prediction methods.
Copyright 2001 Academic Press.

AB The microbial rhodopsins (MR) are homologous to putative chaperone and
retinal-binding proteins of fungi. These proteins
comprise a coherent family that we have termed the MR family. We have used
modeling techniques to predict the structure of one of
the putative yeast chaperone proteins, YRO2, based on homology
with bacteriorhodopsins (BR). Availability of the structure allowed
depiction of conserved residues that are likely to be of functional
significance. The results lead us to predict an extracellular
protein folding function and a transmembrane proton
transport pathway. We suggest that protein folding is
energized by a novel mechanism involving the proton motive force. We
further show that MR family proteins are distantly related to a
family of fungal, animal and plant proteins that include the
human lysosomal cystine transporter (LCT) of man (cystinosin), mutations
in which cause cystinosis. Sequence and phylogenetic analyses of both the
MR family and the LCT family are reported. Proteins in both
families are of the same approximate size, exhibit seven putative
transmembrane alpha-helical spanners (TMSs) and show limited sequence
similarity. We show that the LCT family arose by an internal gene
duplication event and that TMSs 1-3 are homologous to TMSs 5-7. Although
the same could not be demonstrated statistically for MR family members,
homology with the LCT family suggests (but does not prove) a common
evolutionary pathway. Thus, TMSs 1-3 and 5-7 in both LCT and MR family
members may share a common origin, accounting for their shared structural
features.

AB Ab initio protein structure prediction
methods have improved dramatically in the past several years. Because
these methods require only the sequence of the protein of
interest, they are potentially applicable to the open reading frames in
the many organisms whose sequences have been and will be determined. Ab
initio methods cannot currently produce models of high enough resolution
for use in rational drug design, but there is an exciting potential for
using the methods for functional annotation of protein sequences
on a genomic scale. Here we illustrate how functional insights can be
obtained from low-resolution predicted structures
using examples from blind ab initio structure
predictions from the third and fourth critical assessment of
structure prediction (CASP3, CASP4) experiments.
Copyright 2001 Academic Press.
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AB In this review are presented the last new results of our research group 
dealing with the molecular structures (atomic level) of tropoelastin, 
elastin and elastin derived peptides studied by using essentially methods 
of bioinformatics (theoretical predictions and molecular modelling) linked 
to experimental circular dichroism spectroscopic studies. We already had 
characterized both the local secondary structure and some parts of the 
tertiary structure of the tropoelastin and elastin molecules (human, 
bovine...), by using either theoretical predictions (local 
secondary structure, linear epitopes...) and/or experimental 
data (optical spectroscopic methods: Raman scattering, infrared 
absorption, circular dichroism) . Except the cross -linking regions which 




are in helical conformations, the whole tropoelastin structure displays a 
lot of beta-reverse turns which usually belong to irregular structures m 
proteins. These turns play a key role in other regularly 
structures orientation (alpha-helix, beta-strand) , thus they are very 
important in the native protein 3D architecture. It is 

particularly true for human tropoelastin, because its sequence is rich in 
glycines and prolines, and these residues are frequently met in beta- turns 
(a beta-turn is made of four consecutive residues which are stabilized by 
an hydrogen bond) . Several types of beta-turns can be defined with the 
dihedral angles values phi and psi of the two central residues. Thus, by 
using a very recent updated set of propensities for the amino acid 
residues to belong to given types of reverse beta-turns (extracted from a 
reference set of known 3-D structures of globular proteins), we 
have determined, (by using our home made software COUDES) , for all 
possible tetrapeptides of the human tropoelastin sequence, the 
distribution and the characterization of the possible type of turns. Thus, 
it is shown that the locations and/or the types of these reverse 
beta-turns reveal a regularity and are not all random. This confirms our 
hypothesis that intra-molecular elasticity of tropoelastin could be 
explained by the possibility of transitions between conformations 
involving short beta-strands and beta-turns. This result is of great 
interest in the construction (by using molecular biology) of elastic 
biomaterials derived from the elastin sequence (particularly, the elastin 
derived peptides corresponding to the sequence exon 21--(exon 24--exon 
24...). Our study permit also to predict the conformations of specific 
elastin derived peptides which could have interesting biological activity. 
Peptides resulting from the degradation of elastin, the insoluble polymer 
of tropoelastin and responsible for the elasticity of vertebrate tissues, 
can induce biological effects and notably the regulation of matrix 
metalloproteinases (MMP-s) activity. Recently, it was proposed that some 
elastin derived hexapeptides resulting from circular permutations of 
VGVAPG (a three fold repetition sequence in exon 24 of human 
tropoelastin) possess MMP-1 production and activation regulation 
properties. This effect depends on the presence of the tropoelastin 
specific membraneous receptor 67 KDa EBP (Elastin Binding Protein 
) . Our results obtained by using both circular dichroism spectroscopy and 
linear predictions confirmed the hypothesis of a structure dependent 
mechanism with a possibly occurring type VIII beta-turn on the first four 
residues of the GXXPG sequence consensus which is only present among all 
active peptides. Thus, we have performed extensive molecular dynamics 
studies, in both implicit and explicit solvent, on these active and 
inactive elastin derived hexapeptides. Using our own analysis method of 
pattern recognition of the types of the beta-reverse-turns followed during 
the molecular dynamics trajectory, we found that active and inactive 
peptides effectively form two well distinct conformational groups in which 
active peptides preferentially adopt a conformation close to a type VIII GXXP 
beta-reverse-turn. The structural role of the C-terminal G residue could 
also be explained. Additional molecular simulations on (VGVAPG)2 and 
(VGVAPG)3 show the formation of two or three GXXP tetrapeptides adopting a 
structure close to type VIII beta-reverse-turn, suggesting a local 
conformational preference for this motif. This observation of a specific 
structural single and/or repeated motif is in agreement with the circular 
dichroism spectra of the involved (VGVAPG)1, (VGVAPG)2 and (VGVAPG)3 
peptides and then it can be proposed that their biological activities have 
to be linear. The final aim of this type of work is to understand more 
about the sequence/structure/function/activity relationships of those 
structured peptides in order to propose specific sequences (corresponding 
to specific structures) for best biological activity results. 
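
The sequence-level part of the abstract above (hexapeptides obtained by circular permutation of VGVAPG, and the GXXPG consensus) can be illustrated with a minimal sketch. The function names are hypothetical and this is not the authors' software; it only enumerates rotations and reports where the consensus occurs, and it makes no claim about which peptides are biologically active or which turn type they adopt.

```python
import re

# GXXPG consensus: G, any two residues, P, G.
# A lookahead is used so that overlapping occurrences would also be reported.
GXXPG = re.compile(r"(?=G..PG)")


def circular_permutations(peptide: str) -> list[str]:
    """Return every rotation of the peptide, starting with the peptide itself."""
    return [peptide[i:] + peptide[:i] for i in range(len(peptide))]


def motif_positions(peptide: str, pattern: re.Pattern = GXXPG) -> list[int]:
    """Return the 0-based start index of every occurrence of the motif."""
    return [m.start() for m in pattern.finditer(peptide)]


if __name__ == "__main__":
    for rotation in circular_permutations("VGVAPG"):
        print(rotation, motif_positions(rotation))
    # The dimer and trimer mentioned in the abstract can be scanned the same way.
    print(motif_positions("VGVAPG" * 2), motif_positions("VGVAPG" * 3))
```

Scanning the repeated sequences this way only locates candidate GXXP(G) sites; whether they adopt a type VIII turn is a structural question that the sketch does not address.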
AB Considerable recent progress has been made in the field of ab initio 
protein structure prediction, as witnessed by 
the third Critical Assessment of Structure Prediction 
(CASP3). In spite of this progress, much work remains, for the field has 
yet to produce consistently reliable ab initio structure 
prediction protocols. In this work, we review the features of 
current ab initio protocols in an attempt to highlight the foundations of 
recent progress in the field and suggest promising directions for future 
work. 

L4 ANSWER 14 OF 96 MEDLINE 

AN 2001245538 MEDLINE 

DN 21109762 PubMed ID: 11179902 

TI Integration of genome data and protein structures: 
prediction of protein folds, protein 
AB With the massive amount of sequence and structural data being produced, 
new avenues emerge for exploiting the information therein for applications 
in several fields. Fold distributions can be mapped onto entire 
genomes to learn about the nature of the protein universe and 
many of the interactions between proteins can now be predicted 
solely on the basis of the genomic context of their genes. Furthermore, by 
taking the new incoming data on single nucleotide polymorphisms and 
mapping them onto three-dimensional structures of proteins, 
problems concerning population, medical and evolutionary genetics can be 
addressed. 
AB The unique folded structure makes a polypeptide a 
functional protein. The number of known sequences is about a 
hundred times larger than the number of known structures and the gap is 
increasing rapidly. The primary goal of all structure 
prediction methods is to obtain structure-related information on 
proteins whose structures have not been determined 
experimentally. Besides this goal, the development of accurate prediction 
methods helps to reveal principles of protein folding. 
Here we present a brief survey of protein structure 
predictions based on statistical analyses of known sequence and 
structure data. We discuss the background of these methods and attempt to 
elucidate the principles that govern structure formation of soluble and 
membrane proteins. 



L4 ANSWER 16 OF 96 MEDLINE 

AN 2001700314 MEDLINE 

DN 21615649 PubMed ID: 11747907 

TI The architecture of parallel beta-helices and related folds. 
AU Jenkins J; Pickersgill R 

CS Institute of Food Research, Norwich Research Park, Colney Lane, Norwich 

NR4 7UA, UK., john.jenkins@bbsrc.ac.uk 
SO PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY, (2001 Oct) 77 (2) 111-75. 

Ref: 198 

Journal code: 0401233. ISSN: 0079-6107. 
CY England: United Kingdom 
DT Journal; Article; (JOURNAL ARTICLE) 
General Review; (REVIEW) 

(REVIEW, ACADEMIC) 
LA English 
FS Priority Journals 
EM 200203 

ED Entered STN: 20011219 

Last Updated on STN: 20020305 
Entered Medline: 20020304 

AB Three-dimensional structures have been determined of a large number of 
proteins characterized by a repetitive fold where each 
of the repeats (coils) supplies a strand to one or more parallel 
beta-sheets. Some of these proteins form superfamilies of 
proteins, which have probably arisen by divergent evolution from a 
common ancestor. The classical example is the family including four 
families of pectinases without obviously related primary sequences, the 
phage P22 tailspike endorhamnosidase, chondroitinase B and possibly 
pertactin from Bordetella pertussis. These show extensive stacking of 
similar residues to give aliphatic, aromatic and polar stacks such as the 
asparagine ladder. This suggests that coils can be added or removed by 
duplication or deletion of the DNA corresponding to one or more coils and 
explains how homologous proteins can have different numbers of 
coils. This process can also account for the evolution of other families of 
proteins such as the beta-rolls, the leucine-rich repeat 
proteins, the hexapeptide repeat family, two separate families of 
beta-helical antifreeze proteins and the spiral folds. 

These families need not be related to each other but will share features 
such as relatively untwisted beta-sheets, stacking of similar residues and 
turns between beta-strands of approximately 90 degrees often stabilized by 
hydrogen bonding along the direction of the parallel beta-helix. Repetitive 
folds present special problems in the comparison of structures but 
offer attractive targets for structure prediction. The 
stacking of similar residues on a flat parallel beta-sheet may account for 
the formation of amyloid with beta-strands at right-angles to the fibril 
axis from many unrelated peptides. 
AB As the number of completely sequenced genomes rapidly increases, including 
now the complete Human Genome sequence, the post-genomic problems of 
genome-scale protein structure determination and the issue of 
gene function identification become ever more pressing. In fact, these 
problems can be seen as interrelated in that experimentally determining or 
predicting the structure of proteins 
encoded by genes of interest is one possible means to glean subtle hints 
as to the functions of these genes. The applicability of this approach to 
gene characterisation is reviewed, along with a brief survey of the 
reliability of large-scale protein structure 
prediction methods and the prospects for the development of new 
prediction methods. 
AB The prediction of protein structure, based 
primarily on sequence and structure homology, has become an increasingly 
important activity. Homology models have become more accurate and their 
range of applicability has increased. Progress has come, in part, from the 
flood of sequence and structure information that has appeared over the 
past few years, and also from improvements in analysis tools. These 
include profile methods for sequence searches, the use of 
three-dimensional structure information in sequence alignment and new 
homology modeling tools, specifically in the prediction of loop and 
side-chain conformations. There have also been important advances in 
understanding the physical chemical basis of protein stability 
and the corresponding use of physical chemical potential functions to 
distinguish correctly folded from incorrectly folded 
protein conformations. 
AB The fastest simple, single domain proteins fold a 
million times more rapidly than the slowest. Ultimately this broad kinetic 
spectrum is determined by the amino acid sequences that define these 
proteins, suggesting that the mechanisms that underlie 
folding may be almost as complex as the sequences that encode 
them. Here, however, we summarize recent experimental results which 
suggest that (1) despite a vast diversity of structures and functions, 
there are fundamental similarities in the folding mechanisms of 
single domain proteins and (2) rather than being highly 
sensitive to the finest details of sequence, their folding 
kinetics are determined primarily by the large-scale, redundant features 
of sequence that determine a protein's gross structural 
properties. That folding kinetics can be predicted 
using simple, empirical, structure-based rules suggests that the 
fundamental physics underlying folding may be quite 
straightforward and that a general and quantitative theory of 
protein folding rates and mechanisms (as opposed to 
unfolding rates and thus protein stability) may be near on the 
horizon. 
AB nAChRs are pentameric transmembrane proteins belonging to the 
superfamily of ligand-gated ion channels that includes the 5HT3, glycine, 
GABAA, and GABAC receptors. Electron microscopy, affinity labeling, and 
mutagenesis experiments, together with secondary structure 
predictions and measurements, suggest an all-beta folding 
of the N-terminal extracellular domain, with the connecting loops 
contributing to the ACh binding pocket and to the subunit interfaces that 
mediate the allosteric transitions between conformational states. The ion 
channel consists of two distinct elements symmetrically organized along 
the fivefold axis of the molecule: a barrel of five M2 helices, and on the 
cytoplasmic side five loops contributing to the selectivity filter. The 
allosteric transitions of the protein underlying the 
physiological ACh-evoked activation and desensitization possibly involve 
rigid body motion of the extracellular domain of each subunit, linked to a 
global reorganization of the transmembrane domain responsible for channel 
gating. 
AB The achievements in the structural characterization in solution, through 
NMR spectroscopy, of proteins containing metal ions are reviewed 
and discussed. We call this branch "inorganic structural biology". The 
results of this approach are presented here for cytochrome b5, used in 
this paper as a case system. These results are discussed particularly in 
the light of their relevance for understanding the biological function of 
the proteins. Furthermore, the extension of the characterization 
to the internal motions and to the folding/unfolding processes, 
as well as the development of tools for structure 
prediction, are critically presented. The message is that the 
complete characterization of a biological molecule cannot be limited to a 
static description of the structure but it should go beyond, analyzing the 
internal motions occurring at various time scales as well as the behavior 
in different conditions, such as in the presence of denaturing agents. 



L4 ANSWER 22 OF 96 MEDLINE 
AN 2001041026 MEDLINE 



DN 20400938 PubMed ID: 10940251 

TI Comparative protein structure modeling of genes and genomes. 

AU Marti-Renom M A; Stuart A C; Fiser A; Sanchez R; Melo F; Sali A 

CS Laboratories of Molecular Biophysics, Pels Family Center for Biochemistry 

and Structural Biology, Rockefeller University, New York, NY 10021, USA. 

NC GM 54762 (NIGMS) 

SO ANNUAL REVIEW OF BIOPHYSICS AND BIOMOLECULAR STRUCTURE, (2000) 29 291-325. 

Ref: 213 

Journal code: 9211097. ISSN: 1056-8700. 
CY United States 

DT Journal; Article; (JOURNAL ARTICLE) 
General Review; (REVIEW) 

(REVIEW, ACADEMIC) 
LA English 
FS Priority Journals 
EM 200012 

ED Entered STN: 20010322 

Last Updated on STN: 20010322 
Entered Medline: 20001207 

AB Comparative modeling predicts the three-dimensional 
structure of a given protein sequence (target) based 
primarily on its alignment to one or more proteins of known 
structure (templates). The prediction process consists 
of fold assignment, target-template alignment, model building, 
and model evaluation. The number of protein sequences that can 
be modeled and the accuracy of the predictions are increasing steadily 
because of the growth in the number of known protein structures 
and because of the improvements in the modeling software. Further advances 
are necessary in recognizing weak sequence-structure similarities, 
aligning sequences with structures, modeling of rigid body shifts, 
distortions, loops and side chains, as well as detecting errors in a 
model. Despite these problems, it is currently possible to model with 
useful accuracy significant parts of approximately one third of all known 
protein sequences. The use of individual comparative models in 
biology is already rewarding and increasingly widespread. A major new 
challenge for comparative modeling is the integration of it with the 
torrents of data from genome sequencing projects as well as from 
functional and structural genomics. In particular, there is a need to 
develop an automated, rapid, robust, sensitive, and accurate comparative 
modeling pipeline applicable to whole genomes. Such large-scale modeling 
is likely to encourage new kinds of applications for the many resulting 
models, based on their large number and completeness at the level of the 
family, organism, or functional network. 
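
Of the four steps named in the abstract above, target-template alignment is the easiest to illustrate in a few lines. The following is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence), assuming a flat match/mismatch/gap scheme. The scores and function name are illustrative assumptions; real comparative modeling pipelines use substitution matrices, affine gap penalties and structural information, none of which is modeled here.

```python
def global_alignment_score(target: str, template: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of two sequences (Needleman-Wunsch recurrence).

    Only two rows of the dynamic-programming table are kept, since the score
    alone (not the alignment itself) is returned.
    """
    # Row 0: aligning an empty target prefix against j template residues costs j gaps.
    prev = [j * gap for j in range(len(template) + 1)]
    for i in range(1, len(target) + 1):
        cur = [i * gap]  # Column 0: i target residues against nothing.
        for j in range(1, len(template) + 1):
            pair = match if target[i - 1] == template[j - 1] else mismatch
            cur.append(max(prev[j - 1] + pair,   # align the two residues
                           prev[j] + gap,        # gap in the template
                           cur[j - 1] + gap))    # gap in the target
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACDE", "ACDE"))
    print(global_alignment_score("ACDE", "ACE"))
```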
AB Structural genomics projects aim to solve the experimental structures of 
all possible protein folds. Such projects entail a 
conceptual shift from traditional structural biology in which structural 
information is obtained on known proteins to one in which the 
structure of a protein is determined first and the function 
assigned only later. Whereas the goal of converting protein 
structure into function can be accomplished by traditional sequence 
motif-based approaches, recent studies have shown that assignment of a 
protein's biochemical function can also be achieved by scanning 
its structure for a match to the geometry and chemical identity of a known 
active site. Importantly, this approach can use low-resolution structures 
provided by contemporary structure prediction methods. 
When applied to genomes, structural information (either experimental or 
predicted) is likely to play an important role in high-throughput function 
assignment. 
AB Ab initio protein folding methods have been developing 
rapidly over the past few years and, at the last Critical assessment of 
methods of protein structure prediction 
(CASP) meeting, it was shown that important progress has been made in 
generating structure from sequence. Both methods based on statistical 
potentials and methods using physics-based potentials have shown 
improvements. Most current methods use statistics-based potentials and the 
development of these is ongoing. Additionally, the inclusion of multiple 
sequence data in the algorithms in order to aid in finding the native 
structure is a common theme. The use of physics-based potentials is less 
developed, which means that less progress has been made in understanding 
why a sequence forms a structure. 
AB Protein structure prediction, fold 
recognition, homology modeling and design rely mainly on statistical 
effective energy functions. Although the theoretical foundation of such 
functions is not clear, their usefulness has been demonstrated in many 
applications. Molecular mechanics force fields, particularly when 
augmented by implicit solvation models, provide physical effective energy 
functions that are beginning to play a role in this area. 
AB Structural energetics is a method for calculating the energetics of 
protein folding and binding reactions as a function of 
temperature. This approach allows measured energetics to be interpreted 
with regard to the protein structure and allows the 
prediction of energetics from known structures. Recent advances 
include improvements in the parameterization of enthalpy, entropy and heat 
capacity terms and new applications, especially with regard to 
understanding dynamic properties of proteins and how these are 
affected by ligand binding. 
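
How enthalpy, entropy and heat capacity terms combine into a temperature-dependent free energy can be sketched with the standard thermodynamic relations. This is a minimal illustration assuming a temperature-independent heat capacity change; it is not the structure-based parameterization the abstract refers to. The function name, the reference temperature and any numerical inputs are hypothetical.

```python
import math


def gibbs_energy(temp_k: float, dh_ref: float, ds_ref: float, dcp: float,
                 t_ref: float = 298.15) -> float:
    """Free energy change at temp_k from values given at the reference temperature.

    Assumes dcp (heat capacity change) does not vary with temperature:
        dH(T) = dH_ref + dCp * (T - T_ref)
        dS(T) = dS_ref + dCp * ln(T / T_ref)
        dG(T) = dH(T) - T * dS(T)
    Units must be consistent, e.g. kJ/mol for dH and kJ/(mol K) for dS and dCp.
    """
    dh = dh_ref + dcp * (temp_k - t_ref)
    ds = ds_ref + dcp * math.log(temp_k / t_ref)
    return dh - temp_k * ds


if __name__ == "__main__":
    # Illustrative inputs only, not measured values for any protein.
    for temp in (278.15, 298.15, 318.15, 338.15):
        print(temp, round(gibbs_energy(temp, dh_ref=200.0, ds_ref=0.6, dcp=6.0), 2))
```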
AB Protein crystallography has become a major technique for 
understanding cellular processes. This has come about through great 
advances in the technology of data collection and interpretation, 
particularly the use of synchrotron radiation. The ability to express 
eukaryotic genes in Escherichia coli is also important. Analysis of known 
structures shows that all proteins are built from about 1000 
primeval folds. The collection of all primeval folds 
provides a basis for predicting structure from 
sequence. At present about 450 are known. Of the presently sequenced 
genomes only a fraction can be related to known proteins on the 
basis of sequence alone. Attempts are being made to determine all (or as 
many as possible) of the structures from some bacterial genomes in the 
expectation that structure will point to function more reliably than does 
sequence. Membrane proteins present a special problem. The next 
20 years may see the experimental determination of another 40,000 
protein structures. This will make considerable demands on 
synchrotron sources and will require many more biochemists than are 
currently available. The availability of massive structure databases will 
alter the way biochemistry is done. 
AB Finding the optimal solution to a complex optimization problem is of great 
importance in many fields, ranging from protein 
structure prediction to the design of microprocessor 
circuitry. Some recent progress in finding the global minima of potential 
energy functions is described, focusing on applications of the simple 
"basin-hopping" approach to atomic and molecular clusters and more 
complicated hypersurface deformation techniques for crystals and 
biomolecules. These methods have produced promising results and should 
enable larger and more complex systems to be treated in the future. 
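
The "basin-hopping" idea named in the abstract above (perturb, relax to the nearest local minimum, then accept or reject the new minimum) can be sketched on a toy one-dimensional double-well function. Everything here is an illustrative assumption: the test function, the fixed-step gradient descent used for relaxation, the step size and the temperature. It is not the implementation applied to clusters or biomolecules in the reviewed work.

```python
import math
import random


def energy(x: float) -> float:
    """Toy double-well potential: global minimum on the left, local one on the right."""
    return x**4 - 4 * x**2 + x


def gradient(x: float) -> float:
    return 4 * x**3 - 8 * x + 1


def local_minimize(x: float, rate: float = 0.01, steps: int = 2000) -> float:
    """Relax to the nearest local minimum by fixed-step gradient descent."""
    for _ in range(steps):
        x -= rate * gradient(x)
    return x


def basin_hopping(x0: float, hops: int = 200, step: float = 2.0,
                  temperature: float = 1.0, seed: int = 0) -> float:
    """Return the lowest-energy local minimum visited."""
    rng = random.Random(seed)
    current = local_minimize(x0)
    best = current
    for _ in range(hops):
        # Random displacement followed by relaxation maps each trial onto a minimum.
        trial = local_minimize(current + rng.uniform(-step, step))
        delta = energy(trial) - energy(current)
        # Metropolis criterion applied to the energies of the relaxed minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial
        if energy(current) < energy(best):
            best = current
    return best


if __name__ == "__main__":
    start = 1.3  # lies in the basin of the higher-energy local minimum
    found = basin_hopping(start)
    print(local_minimize(start), found, energy(found))
```

For real problems a library routine such as SciPy's `scipy.optimize.basinhopping` would normally be preferred over a hand-written loop.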
AB The current state of the art in modeling protein structure has 
been assessed, based on the results of the CASP (Critical Assessment of 
protein Structure Prediction) experiments. In 
comparative modeling, improvements have been made in sequence alignment, 
sidechain orientation and loop building. Refinement of the models remains 
a serious challenge. Improved sequence profile methods have had a large 
impact in fold recognition. Although there has been some 
progress in alignment quality, this factor still limits model usefulness. 
In ab initio structure prediction, there has been 
notable progress in building approximately correct structures of 40-60 
residue-long protein fragments. There is still a long way to go 
before the general ab initio prediction problem is solved. Overall, the 
field is maturing into a practical technology, able to deliver useful 
models for a large number of sequences. 
AB The third comparative assessment of techniques of protein 
structure prediction (CASP3) was held during 1998. This 
is a blind trial in which structures are predicted 
prior to having knowledge of the coordinates, which are then revealed to 
enable the assessment. Three sections at the meeting evaluated different 
methodologies - comparative modelling, fold recognition and ab 
initio methods. For some, but not all of the target coordinates, high 
quality models were submitted in each of these sections. There have been 
improvements in prediction techniques since CASP2 in 1996, most notably 
for ab initio methods. 
AB A major challenge in the post-genome era will be determination of the 
functions of the encoded protein sequences. Since it is 
generally assumed that the function of a protein is closely 
linked to its three-dimensional structure, prediction 
or experimental determination of the library of protein 
structures is a matter of high priority. However, a large proportion of 
gene sequences appear to code not for folded, globular 
proteins, but for long stretches of amino acids that are likely to 
be either unfolded in solution or adopt non-globular structures of unknown 
conformation. Characterization of the conformational propensities and 
function of the non-globular protein sequences represents a 
major challenge. The high proportion of these sequences in the genomes of 
all organisms studied to date argues for important, as yet unknown 
functions, since there could be no other reason for their persistence 
throughout evolution. Clearly the assumption that a folded 
three-dimensional structure is necessary for function needs to be 
re-examined. Although the functions of many proteins are 
directly related to their three-dimensional structures, numerous 
proteins that lack intrinsic globular structure under 
physiological conditions have now been recognized. Such proteins 
are frequently involved in some of the most important regulatory functions 
in the cell, and the lack of intrinsic structure in many cases is relieved 
when the protein binds to its target molecule. The intrinsic 
lack of structure can confer functional advantages on a protein, 
including the ability to bind to several different targets. It also allows 
precise control over the thermodynamics of the binding process and 
provides a simple mechanism for inducibility by phosphorylation or through 
interaction with other components of the cellular machinery. Numerous 
examples of domains that are unstructured in solution but which become 
structured upon binding to the target have been noted in the areas of cell 
cycle control and both transcriptional and translational regulation, and 
unstructured domains are present in proteins that are targeted 
for rapid destruction. Since such proteins participate in 
critical cellular control mechanisms, it appears likely that their rapid 
turnover, aided by their unstructured nature in the unbound state, 
provides a level of control that allows rapid and accurate responses of 
the cell to changing environmental conditions. 
AB Stably folded membrane proteins reside in a free 
energy minimum determined by the interactions of the peptide chains with 
each other, the lipid bilayer hydrocarbon core, the bilayer interface, and 
with water. The prediction of three-dimensional 
structure from sequence requires a detailed understanding of these 
interactions. Progress toward this objective is summarized in this review 
by means of a thermodynamic framework for describing membrane 
protein folding and stability. The framework includes a 
coherent thermodynamic formalism for determining and describing the 
energetics of peptide-bilayer interactions and a review of the properties 
of the environment of membrane proteins--the bilayer milieu. 
Using a four-step thermodynamic cycle as a guide, advances in three main 
aspects of membrane protein folding energetics are 
discussed: protein binding and folding in bilayer 
interfaces, transmembrane helix insertion, and helix-helix interactions. 
The concepts of membrane protein stability that emerge provide 
insights to fundamental issues of protein folding. 
AB This article is a personal perspective on the developments in the field of 
protein folding over approximately the last 40 years. In 
addition to its historical aspects, the article presents a view of the 
principles of protein folding with particular emphasis 
on the relationship of these principles to the problem of protein 
structure prediction. It is argued that despite much 
that is new, the essential elements of our current understanding of 
protein folding were anticipated by researchers many 
years ago. These elements include the recognition of the central 
importance of the polypeptide backbone as a determinant of 
protein conformation, hierarchical protein 
folding, and multiple folding pathways. Important areas 
of progress include a detailed characterization of the folding 
pathways of a number of proteins and a fundamental understanding 
of the physical chemical forces that determine protein 
stability. Despite these developments, fold prediction 
algorithms still encounter difficulties in identifying the correct 
fold for a given sequence. This may be due to the possibility that 
the free energy differences between at least a few alternate conformations 
of many proteins are not large. Significant progress in 
protein structure prediction has been due 
primarily to the explosive growth of sequence and structural databases. 
However, further progress is likely to depend in part on the ability to 
combine information available from databases with principles and 
algorithms derived from physical chemical studies of protein 
folding. An approach to the integration of the two areas is 
outlined with specific reference to the PrISM program that is a fully 
integrated sequence/structural-analysis/fold-recognition/homology 
model building software system. 
Copyright 1999 Academic Press. 
AB We describe the RNA folding problem and contrast it with the 
much more difficult protein folding problem. RNA has 
four similar monomer units, whereas proteins have 20 very 
different residues. The folding of RNA is hierarchical in that 
secondary structure is much more stable than tertiary folding. 
In RNA the two levels of folding (secondary and tertiary) can be 
experimentally separated by the presence or absence of Mg2+. Secondary 
structure can be predicted successfully from 
experimental thermodynamic data on secondary structure elements: helices, 
loops, and bulges. Tertiary interactions can then be added without much 
distortion of the secondary structure. These observations suggest a 
folding algorithm to predict the structure of 
an RNA from its sequence. However, to solve the RNA folding 
problem one needs thermodynamic data on tertiary structure interactions, 
and identification and characterization of metal-ion binding sites. These 
data, together with force versus extension measurements on single RNA 
molecules, should provide the information necessary to test and refine the 
proposed algorithm. 
Copyright 1999 Academic Press. 
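
The dynamic-programming flavor of RNA secondary structure prediction can be illustrated with a deliberately simplified stand-in. The sketch below uses the Nussinov recurrence, which maximizes the number of base pairs, and does not implement the thermodynamic, free-energy-based approach the abstract describes. The pairing rules (Watson-Crick plus G-U wobble), the minimum hairpin loop of three unpaired bases and the function name are assumptions made for illustration.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov recurrence).

    best[i][j] is the optimum for the subsequence seq[i..j]. A pair (k, j) is
    only allowed when at least min_loop unpaired bases lie between k and j.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

A free-energy version would replace the "+ 1" per pair with experimentally derived energies for helices, loops and bulges, which is where the thermodynamic data mentioned in the abstract enter.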
AB Assigning three-dimensional protein folds to genome 
sequences is essential to understanding protein function. 
Although experimental three-dimensional structures are currently available 
for only a very small fraction of these sequences, computational 
fold assignment is able to assign folds to 20-30% of the 
sequences in various genomes. This percentage varies depending on the 
particular organism under analysis, on the sensitivities of the methods 
used and on the number of experimental structures available at the time 
the assignment is carried out. The fraction of assignable sequences is 
currently increasing at an annual rate of roughly 18%. If this rate is 
sustained throughout the coming years, three-dimensional computational 
models for more than half of the genome sequences may be available by the 
year 2003. 
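
The extrapolation in the abstract above can be checked with a short calculation. This sketch assumes the 18% figure is a relative, compounding annual increase in the assignable fraction, which the abstract does not state explicitly; the function name is hypothetical.

```python
import math


def years_to_reach(start_fraction: float, target_fraction: float,
                   annual_growth: float) -> float:
    """Years needed for start_fraction to grow to target_fraction.

    Assumes compound growth: fraction(t) = start_fraction * (1 + annual_growth) ** t.
    """
    return math.log(target_fraction / start_fraction) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # Starting points are the two ends of the 20-30% range quoted in the abstract.
    for start in (0.20, 0.30):
        print(start, round(years_to_reach(start, 0.50, 0.18), 1))
```

Under this reading, the upper end of the 20-30% range crosses one half after roughly three years and the lower end after between five and six.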
AB Spectacular achievements in whole genome sequencing open up new 
possibilities for structural research. Protein structures can 
now be studied in their natural genomic context. On the other hand, 
structure prediction algorithms can be improved using 
species-specific tendencies in folding patterns. Finally, 
efficient strategies to select targets for structure determination can be 
devised. In this review we consider new computational approaches and 
results in protein structure analysis stemming from the 
availability of complete genomes. 




L4 ANSWER 38 OF 96 MEDLINE 

AN 1999089220 MEDLINE 

DN 99089220 PubMed ID: 9872054 

TI Contemporary approaches to protein structure classification. 

AU Swindells M B; Orengo C A; Jones D T; Hutchinson E G; Thornton J M 

CS Helix Research Institute, Kisarazu, Japan. 

SO BIOESSAYS, (1998 Nov) 20 (11) 884-91. Ref: 53 

Journal code: 8510851. ISSN: 0265-9247. 
CY ENGLAND: United Kingdom 
DT Journal; Article; (JOURNAL ARTICLE) 
General Review; (REVIEW) 
(REVIEW, TUTORIAL) 
LA English 

FS Priority Journals; Space Life Sciences 
EM 199901 

ED Entered STN: 19990202 

Last Updated on STN: 20000303 
Entered Medline: 19990121 

AB In a similar manner to sequence database searching, it is also possible to 
compare three-dimensional protein structure. Such methods can be 
extremely useful because a structural similarity may represent a distant 
evolutionary relationship that is undetectable by sequence analysis. In 
this review, we summarise the most popular structure comparison methods, 
show how they can be used for database searching, and then describe some 
of the most advanced attempts to develop comprehensive protein 
structure classifications. With such data, it is possible to identify 
distant evolutionary relationships, provide libraries of unique 
folds for structure prediction, estimate the 
total number of folds that exist, and investigate the preference 
for certain types of structures over others. 
AB The detection of homologous protein sequences frequently 
provides useful predictions of function and structure. 
Methods for homology searching have continued to improve, such that very 
distant evolutionary relationships can now be detected. Little attention 
has been paid, however, to the problems of detecting homology when domains 
are inserted or permuted. Here we review recent occurrences of these 
phenomena and discuss methods that permit their detection. 
AB Protein folding appears to be almost too complex for a 
complete description or for accurate structure 
prediction from sequence data. A simple way of analysing local 
interactions, however, bears promise of linking theory with experiment and 
cutting through some of the complexities. 